40 research outputs found

    View Selection in Semantic Web Databases

    We consider the setting of a Semantic Web database, containing both explicit data encoded in RDF triples, and implicit data, implied by the RDF semantics. Based on a query workload, we address the problem of selecting a set of views to be materialized in the database, minimizing a combination of query processing, view storage, and view maintenance costs. Starting from an existing relational view selection method, we devise new algorithms for recommending view sets, and show that they scale significantly beyond the existing relational ones when adapted to the RDF context. To account for implicit triples in query answers, we propose a novel RDF query reformulation algorithm and an innovative way of incorporating it into view selection in order to avoid a combinatorial explosion in the complexity of the selection process. The interest of our techniques is demonstrated through a set of experiments. Comment: VLDB201

    RDFViewS: A Storage Tuning Wizard for RDF Applications

    In recent years, the significant growth of RDF data used in numerous applications has made its efficient and scalable manipulation an important issue. In this paper, we present RDFViewS, a system capable of choosing the most suitable views to materialize, in order to minimize the query response time for a specific SPARQL query workload, while taking into account the view maintenance cost and storage space constraints. Our system employs practical algorithms and heuristics to navigate through the search space of potential view configurations, and exploits any available semantic information, expressed via an RDF Schema, to ensure the completeness of the query evaluation.

    SPARQL query answering with bitmap indexes

    When querying RDF data, one may use reasoning to reach intensional data, i.e., data defined by sets of rules. This is usually achieved through forward chaining, with space and maintenance overheads, or backward chaining, with high query evaluation and optimization costs. Recent approaches rely on pre-computing the terminological closure of the data rather than the full saturation. In this setting, one can even query the data without resorting to backward chaining, using a so-called semantic index. However, these techniques are limited in the type of queries they can support. In this paper, we introduce a data storage technique which mitigates the space issues of forward chaining. We show that it can also be used with a semantic index. We propose a new structure for the index that relies on bitmaps, making it resilient to updates. Our experimental results demonstrate that our storage model significantly reduces the space required to store the data. We show that the indexes can be computed quickly and fit well in memory even for very large ontologies. Finally, we analyze how query answering is affected by the data layout.
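
The trade-off between forward and backward chaining described in this abstract can be illustrated with a minimal sketch. It is not the paper's storage model or its bitmap index. It assumes only the two standard RDFS subclass rules (rdfs9 and rdfs11), and the `ex:` triples and all function names are hypothetical.

```python
# Illustrative sketch only: triples are (subject, predicate, object) tuples.
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

# Hypothetical toy dataset, not taken from the paper.
DATA = {
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:PhDStudent", SUBCLASS, "ex:Student"),
    ("ex:alice", TYPE, "ex:PhDStudent"),
    ("ex:bob", TYPE, "ex:Person"),
}


def saturate(triples):
    """Forward chaining: materialize implied triples until a fixpoint."""
    closed = set(triples)
    while True:
        subclass_pairs = [(s, o) for s, p, o in closed if p == SUBCLASS]
        derived = set()
        for child, parent in subclass_pairs:
            for s, p, o in closed:
                # rdfs9 (p is rdf:type) and rdfs11 (p is rdfs:subClassOf):
                # whatever points at `child` also points at `parent`.
                if o == child and p in (TYPE, SUBCLASS):
                    derived.add((s, p, parent))
        if derived <= closed:
            return closed
        closed |= derived


def subclasses_of(cls, triples):
    """Return cls together with all of its transitive subclasses."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == SUBCLASS and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def instances_after_saturation(cls, triples):
    """Pay the space cost up front, then run a plain lookup."""
    return {s for s, p, o in saturate(triples) if p == TYPE and o == cls}


def instances_by_reformulation(cls, triples):
    """Backward chaining: widen the query instead of the data."""
    targets = subclasses_of(cls, triples)
    return {s for s, p, o in triples if p == TYPE and o in targets}


if __name__ == "__main__":
    print(len(DATA), "explicit triples,", len(saturate(DATA)), "after saturation")
    print(sorted(instances_after_saturation("ex:Person", DATA)))
    print(sorted(instances_by_reformulation("ex:Person", DATA)))
```

Both functions return the same answers. Saturation enlarges the stored data, while reformulation leaves the data untouched and does more work at query time.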

    Techniques d'optimisation pour des données semi-structurées du web sémantique

    Since the beginning of the Semantic Web, RDF and SPARQL have become the standard data model and query language to describe resources on the Web. Large amounts of RDF data are now available either as stand-alone datasets or as metadata over semi-structured documents, typically XML. The ability to apply RDF annotations over XML data emphasizes the need to represent and query data and metadata simultaneously. While significant efforts have been invested into producing and publishing annotations manually or automatically, little attention has been devoted to exploiting such data. This thesis aims at setting database foundations for the management of hybrid XML-RDF data. We present XR, a data model capturing the structural aspects of XML data and the semantics of RDF. Our model is general enough to describe pure XML or RDF datasets, as well as RDF-annotated XML data, where any XML node can act as a resource. We also introduce the XRQ query language, which combines features of both XQuery and SPARQL. XRQ not only allows querying the structure of documents and the semantics of their annotations, but also producing annotated semi-structured data on the fly. We introduce the problem of query composition in XRQ, and exhaustively study query evaluation techniques for XR data to demonstrate the feasibility of this data management setting. We have developed an XR platform on top of well-known data management systems for XML and RDF. The platform features several query processing algorithms, whose performance is experimentally compared. We present an application built on top of the XR platform. The application provides manual and automatic annotation tools, and an interface to query annotated Web pages and publicly available XML and RDF datasets concurrently. As a generalization of RDF and SPARQL, XR and XRQ enable RDFS-type query answering. In this respect, we present a technique to support RDFS entailments in RDF (and by extension XR) data management systems.

    Database techniques for semantics-rich semi-structured Web data

    Since the beginning of the Semantic Web, RDF and SPARQL have become the standard data model and query language to describe resources on the Web. Large amounts of RDF data are now available either as stand-alone datasets or as metadata over semi-structured documents, typically XML. The ability to apply RDF annotations over XML data emphasizes the need to represent and query data and metadata simultaneously. While significant efforts have been invested into producing and publishing annotations manually or automatically, little attention has been devoted to exploiting such data. This thesis aims at setting database foundations for the management of hybrid XML-RDF data. We present XR, a data model capturing the structural aspects of XML data and the semantics of RDF. Our model is general enough to describe pure XML or RDF datasets, as well as RDF-annotated XML data, where any XML node can act as a resource. We also introduce the XRQ query language, which combines features of both XQuery and SPARQL. XRQ not only allows querying the structure of documents and the semantics of their annotations, but also producing annotated semi-structured data on the fly. We introduce the problem of query composition in XRQ, and exhaustively study query evaluation techniques for XR data to demonstrate the feasibility of this data management setting. We have developed an XR platform on top of well-known data management systems for XML and RDF. The platform features several query processing algorithms, whose performance is experimentally compared. We present an application built on top of the XR platform. The application provides manual and automatic annotation tools, and an interface to query annotated Web pages and publicly available XML and RDF datasets concurrently. As a generalization of RDF and SPARQL, XR and XRQ enable RDFS-type query answering. In this respect, we present a technique to support RDFS entailments in RDF (and by extension XR) data management systems.

    A Declarative Approach to Data-Driven Fact Checking

    Fact checking is an essential part of any investigative work. For linguistic, psychological and social reasons, it is an inherently human task. Yet, modern media make it increasingly difficult for experts to keep up with the pace at which information is produced. Hence, we believe there is value in tools to assist them in this process. Much of the effort on Web data research has been focused on coping with incompleteness and uncertainty. Comparatively, dealing with context has received less attention, although it is crucial in judging the validity of a claim. For instance, what holds true in one US state might not hold in its neighbors, e.g., due to obsolete or superseded laws. In this work, we address the problem of checking the validity of claims in multiple contexts. We define a language to represent and query facts across different dimensions. The approach is non-intrusive and allows relatively easy modeling, while capturing incompleteness and uncertainty. We describe the syntax and semantics of the language. We present algorithms to demonstrate its feasibility, and we illustrate its usefulness through examples.
